@vicLin8712 vicLin8712 commented Nov 19, 2025

O(1) scheduler: Complete implementation

This PR provides the complete O(1) scheduler implementation and serves as the final part of the 3-part stacked PR series.
It integrates all components introduced in the earlier patches and replaces the legacy O(n) linear scheduler with the new ready-queue–based, RR-cursor-based, bitmap-assisted O(1) design.

Features of the O(1) scheduler

  • Priority-indexed ready queues
    Each priority level maintains an independent ready queue.

  • Bitmap + De Bruijn–based highest-priority lookup
    The scheduler locates the next runnable task in constant time using priority bitmaps and De Bruijn table lookup.

  • RR cursor for fair round-robin scheduling
    Each priority queue maintains a cursor to provide O(1) fair scheduling among tasks of the same priority.

  • Full integration into the scheduler execution path
    The legacy O(n) priority scanning algorithm is completely replaced by the new O(1) logic; the iteration limit IMAX=500 is removed.

  • Idle task fully integrated into new design
    System execution starts in the idle task, which serves as the initial execution context.
    Whenever the idle task yields, control deterministically transitions to the highest-priority runnable task.
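
Taken together, the features above give a selection path that can be modeled by the following toy program. This is a minimal, self-contained sketch, not the kernel code: it assumes 8 priority levels (matching the 8-bit ready bitmap), treats a lower level index as higher priority, and replaces the real ready queues with plain counters and index cursors.

```c
#include <stdint.h>
#include <stdio.h>

#define PRIO_LEVELS 8

/* Toy stand-in for the constant-time lookup; the kernel replaces a loop
 * like this with the De Bruijn multiply + table method (sketched further
 * down, next to the corresponding commit message). */
static int lowest_set_bit(uint32_t v)
{
    int i = 0;
    while (!(v & 1u)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* One ready queue per priority level, modeled here as a task count plus a
 * round-robin cursor (an index instead of a list node). */
static int queue_count[PRIO_LEVELS] = {0, 0, 3, 0, 2, 0, 0, 0};
static int rr_cursor[PRIO_LEVELS];
static uint32_t ready_bitmap = (1u << 2) | (1u << 4); /* levels 2 and 4 ready */

/* Pick the highest-priority non-empty level (lowest index = highest
 * priority here), then rotate fairly within that level. */
static void pick_next(int *prio, int *slot)
{
    *prio = lowest_set_bit(ready_bitmap);
    *slot = rr_cursor[*prio];
    rr_cursor[*prio] = (rr_cursor[*prio] + 1) % queue_count[*prio];
}

int main(void)
{
    for (int i = 0; i < 5; i++) {
        int prio, slot;
        pick_next(&prio, &slot);
        printf("pick: priority level %d, task slot %d\n", prio, slot);
    }
    return 0;
}
```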

Unit tests

Commit: Add unit test suite for RR-cursor scheduler

make sched_cmp
make run

Approach

  • A dedicated controller task is created with priority TASK_PRIO_CRIT to orchestrate the entire test process and enforce deterministic sequencing.
  • After each state change of the test tasks, the unit test verifies both bitmap correctness and per-priority task-count consistency, ensuring alignment with the ready-queue and priority-bitmask invariants maintained by the O(1) scheduler.

Task types
  • Controller task: Coordinates the test flow, triggers all state transitions, and validates ready-queue invariants after each step.
  • Delay task: A runnable task that transitions into TASK_BLOCKED through mo_task_delay().
    Used to verify dequeue behavior and correct clearing of priority bits when a task leaves the schedulable state set.
  • Normal task: A simple infinite-loop runnable task that remains schedulable unless externally suspended or cancelled.
    Serves as the primary subject for testing state transitions and enqueue/dequeue correctness.

Verified state points
The following state transitions are validated by checking both ready-queue task counts and bitmap updates after each operation:

  • Normal task state transitions

    • Creation (TASK_READY) – initial enqueue and priority bit set.
    • Priority change – priority migration updates to queue placement and the corresponding bitmap bit.
    • Suspension (TASK_READY → TASK_SUSPEND) – dequeued from the ready queue and priority bit cleared.
    • Resumption (TASK_SUSPEND → TASK_READY) – re-enqueued with correct priority placement.
    • Cancellation (TASK_READY → TASK_CANCELLED) – removed from ready queues and all bitmap bits fully cleared.
  • Blocked task behavior (TASK_RUNNING → TASK_BLOCKED)

    • The delay task is created and its priority is promoted to match the controller task’s priority (TASK_READY).
    • After the controller yields, the delay task becomes the running task, invokes mo_task_delay(), and transitions to TASK_BLOCKED.
    • Control returns to the controller task, and the test verifies:
      • the delay task is completely removed from the ready queue
      • its priority bit is cleared from the bitmap
      • scheduler selection falls back to the highest remaining runnable task
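
The bitmap/task-count checks described above boil down to one invariant: a priority level's bit is set in the ready bitmap exactly when that level's queue count is non-zero. A minimal sketch of such a check, using the kcb field names from this PR but a simplified stand-in struct and a hypothetical harness:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PRIO_LEVELS 8

/* Simplified stand-in for the scheduler state under test. */
struct kcb_view {
    uint8_t ready_bitmap;               /* bit i set: level i has runnable tasks */
    uint16_t queue_counts[PRIO_LEVELS]; /* runnable tasks per level */
};

/* The invariant checked after every state transition in the tests. */
static bool bitmap_consistent(const struct kcb_view *k)
{
    for (int prio = 0; prio < PRIO_LEVELS; prio++) {
        bool bit_set = (k->ready_bitmap >> prio) & 1u;
        bool has_tasks = k->queue_counts[prio] > 0;
        if (bit_set != has_tasks)
            return false; /* a stale or missing bit breaks O(1) selection */
    }
    return true;
}

int main(void)
{
    struct kcb_view k = {.ready_bitmap = 1u << 4, .queue_counts = {0}};
    k.queue_counts[4] = 2; /* two tasks ready at level 4 */
    printf("%s\n", bitmap_consistent(&k) ? "PASS" : "FAIL");
    return 0;
}
```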

Results

Linmo kernel is starting...
Heap initialized, 130005992 bytes available
idle id 1: entry=80001900 stack=80004488 size=4096
task 2: entry=80000788 stack=80005508 size=4096 prio_level=4 time_slice=5
Scheduler mode: Preemptive
Starting RR-cursor based scheduler test suits...

=== Testing Bitmap and Task Count Consistency ===
task 3: entry=80000168 stack=80006634 size=4096 prio_level=4 time_slice=5
PASS: Bitmap is consistent when TASK_READY
PASS: Task count is consistent when TASK_READY
PASS: Bitmap is consistent when priority migration
PASS: Task count is consistent when priority migration
PASS: Bitmap is consistent when TASK_SUSPENDED
PASS: Task count is consistent when TASK_SUSPENDED
PASS: Bitmap is consistent when TASK_READY from TASK_SUSPENDED
PASS: Task count is consistent when TASK_READY from TASK_SUSPENDED
PASS: Bitmap is consistent when task canceled
PASS: Task count is consistent when task canceled
task 4: entry=80000178 stack=80006634 size=4096 prio_level=4 time_slice=5
PASS: Task count is consistent when task canceled
PASS: Task count is consistent when task blocked

=== Test Results ===
Tests passed: 12
Tests failed: 0
Total tests: 12
All tests PASSED!
RR-cursor based scheduler tests completed successfully.

Note

  1. The term TASK_CANCELLED in this document is used only for explanation. It is not an actual state in the task state machine, but represents the condition where a task has been removed from all scheduling structures and no longer exists in the system.
  2. The task states shown in parentheses (e.g., (TASK_READY)) refer to the state of the test tasks being created or manipulated, not the state of the controller task.

Benchmark

Commit: Add benchmarking files

python3 bench.py

Approach

  1. Spawn N=500 normal tasks to populate the scheduling domain.
    All tasks begin in the TASK_READY state, ensuring the ready queues and bitmap are fully populated.
  2. Scenario configuration (active ratio)
    For each benchmark scenario, suspend a portion of tasks to reach the desired active-ratio load:
  • 2% active
  • 4% active
  • 20% active
  • 50% active
  • 100% active
  3. Benchmark execution
    To compare the legacy O(n) scheduler with the new O(1) scheduler, a compile-time flag OLD is passed to select which scheduling algorithm is active.
    The original linear-search scheduler is preserved in task.c for baseline measurement.
    For each benchmark scenario, the scheduler is executed 20 times to obtain stable timing data.
    The average and maximum scheduling latencies are collected, and the performance improvement is computed as the ratio between the old and new scheduler times (e.g., 1.5× faster).

  4. Metrics collected
    The benchmark collects the following metrics for each scenario:

    • Mean improvement
      Average speedup factor computed as (old_latency / new_latency) across 20 runs.

    • Standard deviation of improvement
      Measures the variability of speedup across repeated runs.

    • Minimum / maximum improvement
      Best and worst observed speedup factors among the 20 runs.

    • 95% confidence interval (CI)
      Statistical confidence bounds for the mean improvement.

    • Mean scheduling latency (old / new)
      Average schedule-selection time for both the legacy O(n) scheduler and the new O(1) scheduler.

    • Maximum scheduling latency (old / new)
      Worst-case schedule-selection time observed for each scheduler.
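
For reference, the improvement statistics listed above follow standard formulas; the sketch below shows the arithmetic for 20 runs, assuming a normal-approximation 95% CI (mean ± 1.96·s/√n) and placeholder ratios. bench.py may compute them differently.

```c
#include <math.h>
#include <stdio.h>

#define RUNS 20

/* Per-run improvement ratio: old scheduler latency / new scheduler latency. */
static double improvement[RUNS];

int main(void)
{
    /* Placeholder data; in practice each entry comes from measured latencies. */
    for (int i = 0; i < RUNS; i++)
        improvement[i] = 2.5 + 0.02 * i;

    double sum = 0.0, min = improvement[0], max = improvement[0];
    for (int i = 0; i < RUNS; i++) {
        sum += improvement[i];
        if (improvement[i] < min) min = improvement[i];
        if (improvement[i] > max) max = improvement[i];
    }
    double mean = sum / RUNS;

    double var = 0.0;
    for (int i = 0; i < RUNS; i++)
        var += (improvement[i] - mean) * (improvement[i] - mean);
    double sd = sqrt(var / (RUNS - 1));            /* sample standard deviation */

    double half = 1.96 * sd / sqrt((double) RUNS); /* 95% CI, normal approximation */
    printf("mean %.2fx, sd %.2fx, min %.2fx, max %.2fx, CI [%.2fx, %.2fx]\n",
           mean, sd, min, max, mean - half, mean + half);
    return 0;
}
```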

Results

Scenario 'Minimal Active':
  mean improvement        = 2.68x faster
  std dev of improvement  = 0.34x
  min / max improvement   = 1.75x  /  3.35x
  95% CI of improvement   = [2.54x, 2.83x]
  mean old sched time     = 5616.25 us
  mean new sched time     = 2119.0 us
  max  old sched time     = 47.0 us
  max  new sched time     = 37.0 us

Scenario 'Moderate Active':
  mean improvement        = 1.80x faster
  std dev of improvement  = 0.27x
  min / max improvement   = 1.27x  /  2.51x
  95% CI of improvement   = [1.68x, 1.92x]
  mean old sched time     = 3887.6 us 
  mean new sched time     = 2179.45 us 
  max  old sched time     = 40.0 us 
  max  new sched time     = 23.0 us 

Scenario 'Heavy Active':
  mean improvement        = 1.02x faster
  std dev of improvement  = 0.08x
  min / max improvement   = 0.84x  /  1.17x
  95% CI of improvement   = [0.98x, 1.06x]
  mean old sched time     = 2150.15 us 
  mean new sched time     = 2119.1 us 
  max  old sched time     = 73.0 us 
  max  new sched time     = 33.0 us 

Scenario 'Stress Test':
  mean improvement        = 0.93x (slower than OLD)
  std dev of improvement  = 0.11x
  min / max improvement   = 0.65x  /  1.20x
  95% CI of improvement   = [0.88x, 0.98x]
  mean old sched time     = 1874.35 us 
  mean new sched time     = 2032.55 us 
  max  old sched time     = 23.0 us 
  max  new sched time     = 20.0 us 

Scenario 'Full Load Test':
  mean improvement        = 0.89x (slower than OLD)
  std dev of improvement  = 0.11x
  min / max improvement   = 0.63x  /  1.07x
  95% CI of improvement   = [0.84x, 0.94x]
  mean old sched time     = 1798.8 us 
  mean new sched time     = 2048.55 us 
  max  old sched time     = 33.0 us 
  max  new sched time     = 52.0 us

Reference

#23 - Draft discussion
#36 - Infrastructure
#37 - Task state transition APIs
ae35c84 - Unit test suite
11e9ee6 - Benchmark


Summary by cubic

Complete O(1) scheduler with priority queues, bitmap lookup, and RR cursors, replacing the legacy O(n) scan. Adds an idle task and updates task lifecycle to use ready queues; up to ~2.7x faster under light load.

  • New Features

    • Priority-indexed ready queues with O(1) highest-priority selection via bitmap + De Bruijn lookup.
    • Per-priority round-robin cursors for fair rotation without list churn.
    • Scheduler state in kcb (ready_bitmap, ready_queues[], rr_cursors[], queue_counts[]) and a dedicated idle task as the safe fallback.
    • Intrusive ready-queue design: TCB embeds rq_node; helpers list_pushback_node() and list_remove_node() manage nodes safely.
    • Unit tests validate bitmap/queue invariants; benchmarks show strong gains at low/moderate activity.
  • Refactors

    • Tasks explicitly enqueue/dequeue on READY/RUNNING transitions (spawn, resume, wakeup, delay, block, suspend, cancel).
    • Semaphore signal uses sched_wakeup_task() to reinsert tasks into ready queues.
    • Priority changes migrate tasks between queues and yield if the running task changes its priority.
    • Startup launches into the idle task (idle_task_init) and removes the IMAX scan limit.

Written for commit 0d8c856. Summary will update automatically on new commits.

@vicLin8712 vicLin8712 mentioned this pull request Nov 19, 2025
@jserv jserv changed the title from "[3/3] O(1) scheduler: Complete implementation" to "O(1) scheduler: Complete implementation" Nov 19, 2025
jserv commented Nov 19, 2025

Do not include numbers in pull-request titles.

This commit extends the core scheduler data structures to support
the new O(1) scheduler design.

Adds in tcb_t:

 - rq_node: embedded list node for ready-queue membership used
   during task state transitions. This avoids redundant malloc/free
   for per-enqueue/dequeue nodes by tying the node's lifetime to
   the task control block.

Adds in kcb_t:

 - ready_bitmap: 8-bit bitmap tracking which priority levels have
   runnable tasks.
 - ready_queues[]: per-priority ready queues for O(1) task
   selection.
 - queue_counts[]: per-priority runnable task counters used for
   bookkeeping and consistency checks.
 - rr_cursors[]: round-robin cursor per priority level to support
   fair selection within the same priority.

These additions are structural only and prepare the scheduler for
O(1) ready-queue operations; they do not change behavior yet.
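
As a rough picture of these additions (field types, array sizes, whether the queues are pointers or embedded list heads, and the node layout are assumptions; the real tcb_t/kcb_t carry many more members):

```c
#include <stdint.h>

#define SCHED_PRIO_LEVELS 8 /* matches the 8-bit ready bitmap described above */

typedef struct list_node {
    struct list_node *prev, *next;
    void *data;
} list_node_t;

typedef struct {
    /* ... existing task fields ... */
    list_node_t rq_node; /* embedded ready-queue node: lifetime tied to the
                            TCB, so enqueue/dequeue needs no malloc/free */
    uint8_t prio_level;
} tcb_t;

typedef struct {
    /* ... existing kernel control block fields ... */
    uint8_t ready_bitmap;                         /* bit i set: level i runnable */
    list_node_t *ready_queues[SCHED_PRIO_LEVELS]; /* per-priority ready queues */
    list_node_t *rr_cursors[SCHED_PRIO_LEVELS];   /* round-robin cursor per level */
    uint16_t queue_counts[SCHED_PRIO_LEVELS];     /* runnable tasks per level */
} kcb_t;
```
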
When a task is enqueued into or dequeued from the ready queue, the
bitmap that indicates the ready queue state should be updated.

These three macros can be used in mo_task_dequeue() and
mo_task_enqueue() APIs to improve readability and maintain
consistency.
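
The three macros themselves are not quoted here; given the ready_bitmap field they presumably resemble set/clear/test helpers along these lines (names invented for illustration):

```c
/* Illustrative only: plausible shapes for the three bitmap helpers. */
#define READY_BITMAP_SET(kcb, prio)   ((kcb)->ready_bitmap |= (uint8_t) (1u << (prio)))
#define READY_BITMAP_CLEAR(kcb, prio) ((kcb)->ready_bitmap &= (uint8_t) ~(1u << (prio)))
#define READY_BITMAP_TEST(kcb, prio)  (((kcb)->ready_bitmap >> (prio)) & 1u)
```
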
This commit introduces two helper functions for intrusive list
usage, where each task embeds its own list node instead of relying
on per-operation malloc/free.

The new APIs allow the scheduler to manipulate ready-queue nodes
directly:

 - list_pushback_node(): append an existing node to the end of the
   list (before the tail sentinel) without allocating memory.

 - list_remove_node(): remove a node from the list without freeing
   it, allowing the caller to control the node's lifetime.

These helpers will be used by the upcoming O(1) scheduler
enqueue/dequeue paths, which require embedded list nodes stored in
tcb_t.
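
A sketch of what these two helpers might look like for a doubly linked list with sentinel nodes; the actual signatures in the PR (for example, taking the list object rather than its tail sentinel) and the node layout may differ:

```c
#include <stddef.h>

typedef struct list_node {
    struct list_node *prev, *next;
    void *data;
} list_node_t;

/* Append an existing node just before the tail sentinel; no allocation. */
static void list_pushback_node(list_node_t *tail, list_node_t *node)
{
    node->prev = tail->prev;
    node->next = tail;
    tail->prev->next = node;
    tail->prev = node;
}

/* Unlink a node without freeing it; the caller keeps ownership of it. */
static void list_remove_node(list_node_t *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node->next = NULL;
}
```
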
This commit refactors sched_enqueue_task() and
sched_dequeue_task() to use the per-priority ready queues and the
embedded rq_node stored in tcb_t, instead of relying only on task
state inspection.

Tasks are now explicitly added to and removed from the appropriate
ready queue, and queue_counts, rr_cursors, and the ready_bitmap
are updated accordingly.
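
Combining the embedded rq_node, the per-priority queue, the counter, and the bitmap bit, the refactored paths have roughly this shape. This is a sketch built on the type and helper sketches above and on the global kcb pointer described in this PR; locking, cursor fix-up when the dequeued node is the cursor, and error handling are omitted.

```c
/* Sketch only: the real functions also maintain the RR cursor on dequeue
 * and run inside the scheduler's critical section. */
static void sched_enqueue_task(tcb_t *task)
{
    uint8_t prio = task->prio_level;

    /* Append the task's embedded node (queue's tail sentinel assumed). */
    list_pushback_node(kcb->ready_queues[prio], &task->rq_node);
    if (kcb->queue_counts[prio]++ == 0) {
        kcb->ready_bitmap |= (uint8_t) (1u << prio); /* level becomes runnable */
        kcb->rr_cursors[prio] = &task->rq_node;      /* first node seeds the cursor */
    }
}

static void sched_dequeue_task(tcb_t *task)
{
    uint8_t prio = task->prio_level;

    list_remove_node(&task->rq_node);
    if (--kcb->queue_counts[prio] == 0)
        kcb->ready_bitmap &= (uint8_t) ~(1u << prio); /* no runnable task left */
}
```
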
This commit introduces a new API, sched_migrate_task(), which enables
migration of a task between ready queues of different priority levels.

The function safely removes the task from its current ready queue and
enqueues it into the target queue, updating the corresponding RR cursor
and ready bitmap to maintain scheduler consistency. This helper will be
used in mo_task_priority() and other task management routines that
adjust task priority dynamically.

Future improvement:
The current enqueue path allocates a new list node for each task
insertion based on its TCB pointer. In the future, this can be optimized
by directly transferring or reusing the existing list node between
ready queues, eliminating the need for additional malloc() and free()
operations during priority migrations.
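
In the node-reuse form described as the future improvement, migration reduces to a dequeue from the old level followed by an enqueue at the new one; note that the PR's current version allocates a fresh node on enqueue instead. A sketch, reusing the enqueue/dequeue sketches above:

```c
/* Sketch of priority migration with direct rq_node reuse (the "future
 * improvement" noted above); the current PR allocates a new node instead. */
static void sched_migrate_task(tcb_t *task, uint8_t new_prio)
{
    sched_dequeue_task(task);    /* clears count/bitmap for the old level */
    task->prio_level = new_prio;
    sched_enqueue_task(task);    /* sets count/bitmap/cursor for the new level */
}
```
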
This change refactors the priority update process in mo_task_priority()
to include early-return checks and proper task migration handling.

- Early-return conditions:
  * Prevent modification of the idle task.
  * Disallow assigning TASK_PRIO_IDLE to non-idle tasks.
  The idle task is created by idle_task_init() during system startup and
  must retain its fixed priority.

- Task migration:
  If the priority-changed task resides in a ready queue (TASK_READY or
  TASK_RUNNING), sched_migrate_task() is called to move it to the queue
  corresponding to the new priority.

- Running task behavior:
  When the current running task changes its own priority, it yields the
  CPU so the scheduler can dispatch the next highest-priority task.
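
The rules above can be summarized in a short sketch; the state field, return codes, and the task_is_idle()/task_yield() calls are illustrative assumptions, not the actual mo_task_priority() signature.

```c
/* Sketch of the early-return and migration rules; illustrative names only. */
static int set_task_priority(tcb_t *task, uint8_t new_prio)
{
    if (task_is_idle(task))          /* never modify the idle task */
        return -1;
    if (new_prio == TASK_PRIO_IDLE)  /* idle priority is reserved */
        return -1;

    if (task->state == TASK_READY || task->state == TASK_RUNNING)
        sched_migrate_task(task, new_prio); /* move between ready queues */
    else
        task->prio_level = new_prio;        /* not queued: just record it */

    if (task == kcb->task_current)
        task_yield(); /* hypothetical yield call: let the scheduler re-pick */
    return 0;
}
```
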
This commit introduces the system idle task and its initialization API
(idle_task_init()). The idle task serves as the default execution
context when no other runnable tasks exist in the system.

The sched_idle() function supports both preemptive and cooperative
modes. In sched_t, a list node named task_idle is added to record the
idle task sentinel. The idle task never enters any ready queue and its
priority level cannot be changed.

When idle_task_init() is called, the idle task is initialized as the
first execution context. This eliminates the need for additional APIs
in main() to set up the initial high-priority task during system launch.
This design allows task priorities to be adjusted safely during
app_main(), while keeping the scheduler’s entry point consistent.

When all ready queues are empty, the scheduler should switch
to idle mode and wait for incoming interrupts. This commit
introduces a dedicated helper to handle that transition,
centralizing the logic and improving readability of the
scheduler path to idle.

Prepare for O(1) bitmap index lookup by adding a 32-entry De Bruijn
sequence table. The table will be used in later commits to replace
iterative bit scanning. No functional change in this patch.

Implement the helper function that uses a De Bruijn multiply-and-LUT
approach to compute the index of the least-significant set bit in O(1)
time complexity.

This helper is not yet wired into the scheduler logic; integration
will follow in a later commit. No functional change in this patch.
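
For reference, the classic 32-entry form of this technique looks like the following; the constants come from the well-known public-domain multiply-and-lookup method, and the actual table contents and helper name added by this PR may differ.

```c
#include <stdint.h>

/* Canonical 32-entry De Bruijn table for the least-significant-bit index
 * (public-domain "multiply and lookup" method). */
static const uint8_t debruijn_lsb32[32] = {
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9,
};

/* Index of the least-significant set bit of a non-zero word, in O(1):
 * v & -v isolates the lowest set bit; multiplying by the De Bruijn constant
 * places a unique 5-bit pattern in the top bits, which indexes the table. */
static inline uint8_t lsb_index(uint32_t v)
{
    return debruijn_lsb32[((v & -v) * 0x077CB531u) >> 27];
}
```
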
Previously, sched_wakeup_task() was limited to internal use within
the scheduler module.
This change makes it globally visible so that it can be reused
in semaphore.c for task wake-up operations.

Previously, mo_sem_signal() only changed the awakened task state
to TASK_READY when a semaphore signal was triggered. In the new
scheduler design, which selects runnable tasks from ready queues,
the awakened task must also be enqueued for scheduling.

This change invokes sched_wakeup_task() to perform the enqueue
operation, ensuring the awakened task is properly inserted into
the ready queue.

Previously, mo_task_spawn() only created a task and appended it to the
global task list (kcb->tasks), assigning the first task directly from
the global list node.

This change adds a call to sched_enqueue_task() within the critical
section to enqueue the task into the ready queue and safely initialize
its scheduling attributes. The first task assignment is now aligned
with the RR cursor mechanism to ensure consistency with the O(1)
scheduler.

Previously, the scheduler iterated through the global task list
(kcb->tasks) to find the next TASK_READY task, resulting in O(N)
selection time. This approach limited scalability and caused
inconsistent task rotation under heavy load.

The new scheduling process:
1. Check the ready bitmap and find the highest priority level.
2. Select the RR cursor node from the corresponding ready queue.
3. Advance the selected cursor node circularly.

Why RR cursor instead of pop/enqueue rotation:
- Fewer operations on the ready queue: compared to the pop/enqueue
  approach, which requires two function calls per switch, the RR
  cursor method only advances one pointer per scheduling cycle.
- Cache friendly: always accesses the same cursor node, improving
  cache locality on hot paths.
- Cycle deterministic: RR cursor design allows deterministic task
  rotation and enables potential future extensions such as cycle
  accounting or fairness-based algorithms.

This change introduces a fully O(1) scheduler design based on
per-priority ready queues and round-robin (RR) cursors. Each ready
queue maintains its own cursor, allowing the scheduler to select
the next runnable task in constant time.
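
Put together, the three steps have roughly this shape. The sketch reuses the lsb_index() and kcb sketches above; the kernel's sentinel handling, circular-advance details, and locking differ.

```c
/* Sketch of the O(1) pick path: bitmap -> De Bruijn index -> RR cursor.
 * Sentinel skipping and locking are omitted; names follow this PR's text. */
static tcb_t *sched_select_next(void)
{
    if (!kcb->ready_bitmap)
        return NULL; /* no runnable task: the caller switches to idle */

    /* 1. Highest runnable priority level (LSB of the ready bitmap). */
    uint8_t prio = lsb_index(kcb->ready_bitmap);

    /* 2. Task currently under that level's round-robin cursor. */
    list_node_t *node = kcb->rr_cursors[prio];
    tcb_t *next = node->data;

    /* 3. Advance the cursor circularly so the next pick at this level
     *    lands on the following task (wrap to the queue head at the end). */
    node = node->next ? node->next : kcb->ready_queues[prio];
    kcb->rr_cursors[prio] = node;

    return next;
}
```
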
Previously, when all ready queues were empty, the scheduler
would trigger a kernel panic. This condition should instead
transition into the idle task rather than panic.

The new sched_switch_to_idle() helper centralizes this logic,
making the path to idle clearer and more readable.

Replace the iterative bitmap scanning with the De Bruijn multiply+LUT
method via the new helper. This change makes top-priority selection
constant-time and deterministic.

The idle task is now initialized in main() during system startup.
This ensures that the scheduler always has a valid execution context
before any user or application tasks are created. Initializing the
idle task early guarantees a safe fallback path when no runnable
tasks exist and keeps the scheduler entry point consistent.

This change sets up the scheduler state during system startup by
assigning kcb->task_current to kcb->harts->task_idle and dispatching
to the idle task as the first execution context.

This commit also keeps the scheduling entry path consistent between
startup and runtime.

Previously, both mo_task_spawn() and idle_task_init() implicitly
bound their created tasks to kcb->task_current as the first execution
context. This behavior caused ambiguity with the scheduler, which is
now responsible for determining the active task during system startup.

This change removes the initial binding logic from both functions,
allowing the startup process (main()) to explicitly assign
kcb->task_current (typically to the idle task) during launch.
This ensures a single, centralized initialization flow and improves
the separation between task creation and scheduling control.